Chapter 5 An Introduction to Analysing Spatial Patterns

In this practical you will learn how to begin to analyse patterns in spatial data. Using data you are already familiar with, in the first part of the practical, you will explore some techniques for analysing patterns of point data in R. Then, in the second part of the practial, you will explore spatial autocorrelation using ArcGIS.

5.1 Point Pattern Analysis

In this analysis we will analyse the patterns of Blue Plaques you downloaded from http://openplaques.org/ in the Week 5 practial. Normally we could download the XML directly from the Web using a package like xml, but the XML feeds from OpenPlaques are a little flaky, so we’ll read in the shapefile for Blue Plaques for London we created in week 5.

The question we want to answer is: “For any given London Borough, are the Blue Plaques within that borough distributed randomly or do they exhibit some kind of dispersed or clustered pattern?”

To answer this question, we will make use of some of the Point Pattern Analysis functions found in the spatstat package.

#first library a few packages that we will use during the practical
#note you may need to install them first...
library(spatstat)
library(sp)
library(rgeos)
library(maptools)
library(GISTools)
library(tmap)
library(sf)
library(geojsonio)
library(tmaptools)

5.1.1 Setting up Your Data

Now, assuming that you’ve got a copy of your London Boroughs shapefile (from week 1) in your new week 6 folder, along with a shapefile of your Blue Plaques; read in the data…

##First, get the London Borough Boundaries
EW <- geojson_read("http://geoportal.statistics.gov.uk/datasets/8edafbe3276d4b56aec60991cbddda50_2.geojson", what = "sp")

Pull out london using grep and the regex wildcard for’start of the string’ (^) to to look for the bit of the district code that relates to London (E09) from the ‘lad15cd’ column in the data slot of our spatial polygons dataframe

BoroughMap <- EW[grep("^E09",EW@data$lad15cd),]
#plot it using the base plot function
qtm(BoroughMap)

summary(BoroughMap)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
##         min        max
## x -0.510277  0.3340243
## y 51.286759 51.6918756
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Data attributes:
##       lad15cd                   lad15nm            lad15nmw     objectid  
##  E09000001: 1   Barking and Dagenham: 1                :33   Min.   :294  
##  E09000002: 1   Barnet              : 1   Abertawe     : 0   1st Qu.:302  
##  E09000003: 1   Bexley              : 1   Blaenau Gwent: 0   Median :310  
##  E09000004: 1   Brent               : 1   Bro Morgannwg: 0   Mean   :310  
##  E09000005: 1   Bromley             : 1   Caerdydd     : 0   3rd Qu.:318  
##  E09000006: 1   Camden              : 1   Caerffili    : 0   Max.   :326  
##  (Other)  :27   (Other)             :27   (Other)      : 0                
##  st_lengthshape   st_areashape      
##  Min.   : 8929   Min.   :  2897649  
##  1st Qu.:28384   1st Qu.: 26797942  
##  Median :37664   Median : 37628571  
##  Mean   :39255   Mean   : 47682317  
##  3rd Qu.:46679   3rd Qu.: 56413925  
##  Max.   :74641   Max.   :150125298  
## 
BNG = "+init=epsg:27700"
BoroughMapBNG <- spTransform(BoroughMap,BNG)

##Now get the location of all Blue Plaques in the City
BluePlaques <- geojson_read("https://s3.eu-west-2.amazonaws.com/openplaques/open-plaques-london-2018-04-08.geojson", what = "sp")
summary(BluePlaques)
## Object of class SpatialPointsDataFrame
## Coordinates:
##              min      max
## coords.x1 -0.477  0.21903
## coords.x2  0.000 51.67830
## Is projected: FALSE 
## proj4string :
## [+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0]
## Number of points: 2812
## Data attributes:
##        id         
##  Min.   :    1.0  
##  1st Qu.:  711.8  
##  Median : 6089.0  
##  Mean   :10622.0  
##  3rd Qu.:10358.2  
##  Max.   :49190.0  
##                   
##                                                                                                                                                                                                                                                        inscription  
##  Frank Matcham (1854-1920) theatre architect designed this theatre                                                                                                                                                                                           :   3  
##  Charlie Chester MBE                                                                                                                                                                                                                                         :   2  
##  Grimâ\200\231s Dyke                                                                                                                                                                                                                                               :   2  
##  Lillie Langtry 1852-1929 actress lived here                                                                                                                                                                                                                 :   2  
##  'Canons' gate pillars\r\nEntrance gate piers to 'Canons' from Old Turnpike Road\r\nBuilt in 1712 for James Brydges 1673-1744, 1st Duke of Chandos. House demolished 1740 and existing buildings constructed from salvaged materials. Piers refurbished 1998.:   1  
##  'Father' Henry Willis 1821-1901 organ builder lived here                                                                                                                                                                                                    :   1  
##  (Other)                                                                                                                                                                                                                                                     :2801
#now set up an EPSG string to help set the projection 
BNG = "+init=epsg:27700"
WGS = "+init=epsg:4326"
BluePlaquesBNG <- spTransform(BluePlaques, BNG)
summary(BluePlaquesBNG)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                  min      max
## coords.x1   505575.2 622211.7
## coords.x2 -5527598.3 199509.2
## Is projected: TRUE 
## proj4string :
## [+init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717
## +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## +ellps=airy
## +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894]
## Number of points: 2812
## Data attributes:
##        id         
##  Min.   :    1.0  
##  1st Qu.:  711.8  
##  Median : 6089.0  
##  Mean   :10622.0  
##  3rd Qu.:10358.2  
##  Max.   :49190.0  
##                   
##                                                                                                                                                                                                                                                        inscription  
##  Frank Matcham (1854-1920) theatre architect designed this theatre                                                                                                                                                                                           :   3  
##  Charlie Chester MBE                                                                                                                                                                                                                                         :   2  
##  Grimâ\200\231s Dyke                                                                                                                                                                                                                                               :   2  
##  Lillie Langtry 1852-1929 actress lived here                                                                                                                                                                                                                 :   2  
##  'Canons' gate pillars\r\nEntrance gate piers to 'Canons' from Old Turnpike Road\r\nBuilt in 1712 for James Brydges 1673-1744, 1st Duke of Chandos. House demolished 1740 and existing buildings constructed from salvaged materials. Piers refurbished 1998.:   1  
##  'Father' Henry Willis 1821-1901 organ builder lived here                                                                                                                                                                                                    :   1  
##  (Other)                                                                                                                                                                                                                                                     :2801
#plot the blue plaques in the city
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(BoroughMapBNG) +
  tm_polygons(col = NA, alpha = 0.5) +
tm_shape(BluePlaquesBNG) +
  tm_dots(col = "blue")

Now, you might have noticed that there is at least one Blue Plaque that falls outside of the Borough boundaries. Errant plaques will cause problems with our analysis, so we need to clip the plaques to the boundaries…First we’ll remove any Plaques with the same grid reference as this will cause problems later on in the analysis..

#remove duplicates
BluePlaquesBNG <- remove.duplicates(BluePlaquesBNG)

Now just select the points inside London - thanks to Robin Lovelace for posting how to do this one, very useful!

BluePlaquesSub <- BluePlaquesBNG[BoroughMapBNG,]
#check to see that they've been removed
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(BoroughMapBNG) +
  tm_polygons(col = NA, alpha = 0.5) +
tm_shape(BluePlaquesSub) +
  tm_dots(col = "blue")

From this point, we could try and carry out our analysis on the whole of London, but you might be waiting until next week for Ripley’s K to be calculated for this many points. Therefore to speed things up and to enable us to compare areas within London, we will select some individual boroughs. First we need to subset our SpatialPolygonsDataFrame to pull out a borough we are interested in. I’m going to choose Harrow as I know there are few enough points for the analysis to definitely work. If you wish, feel free to choose another borough in London and run the same analysis, but beware that if it happens that there are a lot of blue plaques in your borough, the analysis could fall over!!

#extract the borough
Borough <- BoroughMapBNG[BoroughMapBNG@data$lad15nm=="Harrow",]

#or as an sf object:
BoroughMapBNGSF <- st_as_sf(BoroughMapBNG)
BoroughSF <- BoroughMapBNGSF[BoroughMapBNGSF$lad15nm=="Harrow",]

#Check to see that the correct borough has been pulled out
tm_shape(Borough) +
  tm_polygons(col = NA, alpha = 0.5)

Next we need to clip our Blue Plaques so that we have a subset of just those that fall within the borough or interest

#clip the data to our single borough
BluePlaquesSub <- BluePlaquesBNG[Borough,]
#check that it's worked
tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(Borough) +
  tm_polygons(col = NA, alpha = 0.5) +
tm_shape(BluePlaquesSub) +
  tm_dots(col = "blue")

We now have all of our data set up so that we can start the analysis using spatstat. The first thing we need to do is create an observation window for spatstat to carry out its analysis within — we’ll set this to the extent of the Harrow boundary

##now set a window as the borough boundary
window <- as.owin(Borough)
plot(window)

spatstat has its own set of spatial objects that it works with (one of the delights of R is that different packages are written by different people and many have developed their own data types) - it does not work directly with the SpatialPolygonsDataFrames, SpatialPointsDataFrames or sf objects that we are used to. For point pattern analysis, we need to create a point pattern (ppp) object.

#create a ppp object
BluePlaquesSub.ppp <- ppp(x=BluePlaquesSub@coords[,1],y=BluePlaquesSub@coords[,2],window=window)

Try to understand what the different elements in command above is doing. If you are unsure, you can run elements of the code, for example:

BluePlaquesSub@coords[,1]
##  [1] 514971.1 512466.7 514966.0 517339.4 512215.1 515694.1 512269.5
##  [8] 511792.6 515333.7 518598.3 515370.2 512335.7 511539.4 513371.5
## [15] 516746.3 515210.7 515093.2 515561.8 514805.8 513053.6 515166.2
## [22] 513750.4 512508.2 516451.0 514022.4 518187.5 516725.5 513392.3
## [29] 513008.4 514177.1 515300.0 514183.4 518560.0 512639.2 515491.4
## [36] 514789.8 519099.9 512346.3 512343.2

Have a look at the new ppp object

plot(BluePlaquesSub.ppp,pch=16,cex=0.5, main="Blue Plaques Harrow")

5.1.1.1 Summarising your point data - a little aside on Kernel Density Estimation

We will explore what Kernel Density Estimation (KDE) is in a couple of week’s, but here we are going to jump the gun a little bit. One way to summarise your point data is to plot the density of your points under a window called a ‘Kernel’. The size and shape of the Kernel affects the density pattern produced (more of this next week), but it is very easy to produce a KDE map from a ppp object using the density function.

plot(density(BluePlaquesSub.ppp, sigma = 500))

The sigma value sets the diameter of the Kernel (in the units your map is in - in this case, as we are in British National Grid the units are in metres). Try experimenting with different values of sigma to see how that affects the density estimate.

plot(density(BluePlaquesSub.ppp, sigma = 1000))

5.1.2 Quadrat Analysis

So as you saw in the lecture, we are interesting in knowing whether the distribution of points in our study area differs from ‘complete spatial randomness’ - CSR.

The most basic test of CSR is a quadrat analysis. We can carry out a simple quadrat analysis on our data using the quadrat count function in spatstat. Note, I wouldn’t recommend doing a quadrat analysis in any real piece of analysis you conduct, but it is useful for starting to understand the Poisson distribution…

#First plot the points
plot(BluePlaquesSub.ppp,pch=16,cex=0.5, main="Blue Plaques in Harrow")
#now count the points in that fall in a 6 x 6 grid overlaid across the window
plot(quadratcount(BluePlaquesSub.ppp, nx = 6, ny = 6),add=T,col="red")

In our case here, want to know whether or not there is any kind of spatial patterning associated with the Blue Plaques in areas of London. If you recall from the lecture, this means comparing our observed distribution of points with a statistically likely (Complete Spatial Random) distibution, based on the Poisson distribution.

Using the same quadratcount function again (for the same sized grid) we can save the results into a table:

#run the quadrat count
Qcount<-data.frame(quadratcount(BluePlaquesSub.ppp, nx = 6, ny = 6))
#put the results into a data frame
QCountTable <- data.frame(table(Qcount$Freq, exclude=NULL))
#view the data frame
QCountTable
##   Var1 Freq
## 1    0   12
## 2    1    8
## 3    2    4
## 4    3    1
## 5    4    2
## 6    5    1
## 7    7    1
#we don't need the last row, so remove it
QCountTable <- QCountTable[-nrow(QCountTable),]

Check the data type in the first column - if it is factor, we will need to convert it to numeric

class(QCountTable[,1])
## [1] "factor"
#oops, looks like it's a factor, so we need to convert it to numeric
vect<- as.numeric(levels(QCountTable[,1]))
vect <- vect[1:6]
QCountTable[,1] <- vect

OK, so we now have a frequency table - next we need to calculate our expected values. The formula for calculating expected probabilities based on the Poisson distribution is:

\[ Pr\left(X=k\right)=\frac{\lambda^{k}e^{-\lambda}}{k!} \]

#calculate the total blue plaques (Var * Freq)
QCountTable$total <- QCountTable[,1]*QCountTable[,2]
#calculate mean
sums <- colSums(QCountTable[,-1])
sums
##  Freq total 
##    28    32
#and now calculate our mean Poisson parameter (lambda)
lambda <- sums[2]/sums[1]

Calculate expected using the Poisson formula from above — k is the number of blue plaques counted in a square and is found in the first column of our table…

QCountTable$Pr <- ((lambda^QCountTable[,1])*exp(-lambda))/factorial(QCountTable[,1])
#now calculate the expected counts and save them to the table
QCountTable$Expected <- round(QCountTable$Pr * sums[1],0)
QCountTable
##   Var1 Freq total          Pr Expected
## 1    0   12     0 0.318906557        9
## 2    1    8     8 0.364464637       10
## 3    2    4     8 0.208265507        6
## 4    3    1     3 0.079339241        2
## 5    4    2     8 0.022668354        1
## 6    5    1     5 0.005181338        0
#Compare the frequency distributions of the observed and expected point patterns
plot(c(1,5),c(0,14), type="n", xlab="Number of Blue Plaques (Red=Observed, 
     Blue=Expected)", ylab="Frequency of Occurances")
points(QCountTable$Freq, col="Red", type="o", lwd=3)
points(QCountTable$Expected, col="Blue", type="o", lwd=3)

Comparing the observed and expected frequencies for our quadrat counts, we can observe that they both have higher frequency counts at the lower end - something reminiscent of a Poisson distribution. This could indicate that for this particular set of quadrats, our pattern is close to Complete Spatial Randomness (i.e. no clustering or dispersal of points). But how do we confirm this?

To check for sure, we can use the quadrat.test function, built into spatstat. This uses a Chi Squared test to compare the observed and expected frequencies for each quadrat (rather than for quadrat bins, as we have just computed above). If the p-value of our Chi-Squared test is > 0.05, then we can reject a null hyphothesis that says “there is no complete spatial randomness in our data” (we will learn more about what a null-hypothesis is in a couple of weeks, but for the time being, just think about it as the opposite of a hypothesis that says our data exhibit complete spatial randomness). What we need to look for is a value for p > 0.05. If our p-value is > 0.05 then this indicates that we have CSR and there is no pattern in our points. If it is < 0.05, this indicates that we do have clustering in our points.

teststats <- quadrat.test(BluePlaquesSub.ppp, nx = 6, ny = 6)
teststats
## 
##  Chi-squared test of CSR using quadrat counts
##  Pearson X2 statistic
## 
## data:  BluePlaquesSub.ppp
## X2 = 37.959, df = 28, p-value = 0.1984
## alternative hypothesis: two.sided
## 
## Quadrats: 29 tiles (irregular windows)
plot(BluePlaquesSub.ppp,pch=16,cex=0.5, main="Blue Plaques in Harrow")
plot(teststats, add=T, col = "red")

So we can see that the indications are there is no spatial patterning for Blue Plaques in Harrow - at least for this particular grid. Note the warning message - some of the observed counts are very small (0) and this may affect the accuracy of the quadrat test. Recall that the Poisson distribution only describes observed occurrances that are counted in integers - where our occurances = 0 (i.e. not observed), this can be an issue. We also know that there are various other problems that might affect our quadrat analysis, such as the modifiable areal unit problem.

In the new plot, we can see three figures for each quadrat. The top-left figure is the observed count of points; the top-right is the Poisson expected number of points; the bottom value is the Pearson residual value, or (Observed - Expected) / Sqrt(Expected).

5.1.2.1 Try experimenting…

Try running a quadrat analysis for different grid arrangements (2 x 2, 3 x 3, 10 x 10 etc.) - how does this affect your results?

5.1.3 Ripley’s K

One way of getting around the limitations of quadrat analysis is to compare the observed distribution of points with the Poisson random model for a whole range of different distance radii. This is what Ripley’s K function computes.

We can conduct a Ripley’s K test on our data very simply with the spatstat package using the kest function.

K <- Kest(BluePlaquesSub.ppp, correction="border")
plot(K)

The plot for K has a number of elements that are worth explaining. First, the Kpois(r) line in Red is the theoretical value of K for each distance window (r) under a Poisson assumption of Complete Spatial Randomness. The Black line is the estimated values of K accounting for the effects of the edge of the study area.

Where the value of K falls above the line, the data appear to be clustered at that distance. Where the value of K is below the line, the data are dispersed. From the graph, we can see that up until distances of around 1300 metres, Blue Plaques appear to be clustered in Harrow, however, at around 1500 m, the distribution appears random and then dispersed between about 1600 and 2100 metres.

5.1.3.1 Alternatives to Ripley’s K

There are a number of alternative measures of spatial clustering which can be computed in spatstat such as the G and the L functions - I won’t go into them now, but if you are interested, you should delve into the following references:

Bivand, R. S., Pebesma, E. J., & Gómez-Rubio, V. (2008). “Applied spatial data analysis with R.” New York: Springer.

Brundson, C., Comber, L., (2015) “An Introduction to R for Spatial Analysis & Mapping”. Sage.

https://research.csiro.au/software/wp-content/uploads/sites/6/2015/02/Rspatialcourse_CMIS_PDF-Standard.pdf

5.1.4 Density-based spatial clustering of applications with noise: DBSCAN

Quadrat and Ripley’s K analysis are useful exploratory techniques for telling us if we have spatial clusters present in our point data, but they are not able to tell us WHERE in our area of interest the clusters are occurring. To discover this we need to use alternative techniques. One popular technique for discovering clusters in space (be this physical space or variable space) is DBSCAN. For the complete overview of the DBSCAN algorithm, read the original paper by Ester et al. (1996) - http://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf or consult the wikipedia page - https://en.wikipedia.org/wiki/DBSCAN

library(raster)
library(fpc)
library(plyr)
library(OpenStreetMap)

We will now carry out a DBSCAN analysis of blue plaques in my borough to see if there are any clusters present.

#first check the coordinate reference system of the Harrow spatial polygon:
crs(Borough)
## CRS arguments:
##  +init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717
## +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## +ellps=airy
## +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894

DBSCAN requires you to input two parameters: 1. Epsilon - this is the radius within which the algorithm with search for clusters 2. MinPts - this is the minimum number of points that should be considered a cluster

Based on the results of the Ripley’s K analysis earlier, we can see that we are getting clustering up to a radius of around 1200m, with the largest bulge in the graph at around 700m. Therefore, 700m is probably a good place to start and we will begin by searching for clusters of at least 4 points…

#first extract the points from the spatial points data frame
BluePlaquesSubPoints <- data.frame(BluePlaquesSub@coords[,1:2])
#now run the dbscan analysis
db <- fpc::dbscan(BluePlaquesSubPoints, eps = 700, MinPts = 4)
#now plot the results
plot(db, BluePlaquesSubPoints, main = "DBSCAN Output", frame = F)
plot(Borough, add=T)

#dbscan::kNNdistplot(BluePlaquesSubPoints, k =  4)

So the DBSCAN analysis shows that for these values of eps and MinPts there are three clusters in the area I am analysing. Try varying eps and MinPts to see what difference it makes to the output.

No of course the plot above is a little basic and doesn’t look very aesthetically pleasing. As this is R and R is brilliant, we can always produce a much nicer plot by extracting the useful information from the DBSCAN output and use ggplot2 to produce a much cooler map…

#our new db object contains lots of info including the cluster each set of point #coordinates belongs to, whether the point is a seed point or a border point etc. #We can get a summary by just calling the object

library(ggplot2)

Our new db object contains lots of info including the cluster each set of point coordinates belongs to, whether the point is a seed point or a border point etc. We can get a summary by just calling the object

db
## dbscan Pts=39 MinPts=4 eps=700
##         0 1 2 3 4
## border 15 0 1 3 4
## seed    0 8 6 1 1
## total  15 8 7 4 5

If you open up the object in the environment window in RStudio, you will also see the various slots in the object, including cluster

db$cluster
##  [1] 2 1 2 0 1 0 1 0 2 3 2 1 0 4 0 2 0 0 0 4 2 4 1 0 4 3 0 4 0 0 2 0 3 1 0
## [36] 0 3 1 1

We can now add this cluster membership info back into our dataframe

BluePlaquesSubPoints$cluster <- db$cluster

Next we are going to create some convex hull polygons to wrap around the points in our clusters. Use the ddply function in the plyr package to get the convex hull coordinates from the cluster groups in our dataframe

chulls <- ddply(BluePlaquesSubPoints, .(cluster), 
                function(df) df[chull(df$coords.x1, df$coords.x2), ])

As 0 isn’t actually a cluster (it’s all points that aren’t in a cluster) drop it from the dataframe

chulls <- subset(chulls, cluster>=1)

Now create a ggplot2 object from our data

dbplot <- ggplot(data=BluePlaquesSubPoints, 
                 aes(coords.x1,coords.x2, colour=cluster, fill=cluster)) 
#add the points in
dbplot <- dbplot + geom_point()
#now the convex hulls
dbplot <- dbplot + geom_polygon(data = chulls, 
                                aes(coords.x1,coords.x2, group=cluster), 
                                alpha = 0.5) 
#now plot, setting the coordinates to scale correctly and as a black and white plot 
#(just for the hell of it)...
dbplot + theme_bw() + coord_equal()

Now we are getting there, but wouldn’t it be better to add a basemap?!

###add a basemap
##First get the bbox in lat long for Harrow
latlong <- "+init=epsg:4326" 
BoroughWGS <-spTransform(Borough, CRS(latlong))
BoroughWGS@bbox
##          min        max
## x -0.4040502 -0.2671315
## y 51.5530624 51.6405356

Now convert the basemap to British National Grid

basemap<-openmap(c(51.5530613,-0.4040719),c(51.6405318,-0.2671556), zoom=NULL,"stamen-toner")
#convert the basemap to British National Grid - remember we created the BNG object right at the beginning of the practical - it's an epsg string...
basemap_bng<-openproj(basemap, projection=BNG)

Now we can plot our fancy map with the clusters on…

autoplot(basemap_bng) + geom_point(data=BluePlaquesSubPoints, 
                                   aes(coords.x1,coords.x2, 
                                       colour=cluster, fill=cluster)) + 
  geom_polygon(data = chulls, aes(coords.x1,coords.x2, group=cluster, fill=cluster), 
               alpha = 0.5)  

This is end end of the point pattern analysis section of the practical. You have been introduced to the basics of Point Pattern Analysis examining the distribution of Blue Plaques in a London Borough. At this point, you may wish to try running similar analyses on different boroughs (or indeed the whole city) and playing with some of the outputs - although you will find that Ripley’s K will fall over very quickly if you try to run the analysis on that many points)

This how you might make use of these techniques in another context or with different point data…

5.2 Analysing Spatial Autocorrelation with Moran’s I, LISA and friends

Now, at this point you have a choice and at bit like in those Fighting Fantasy (https://en.wikipedia.org/wiki/Fighting_Fantasy) books that I used to read as a kid, you can select either option 1 (which may lead to firey death by dragon) or option 2 (which could lead to a pot of gold)…

Option 1. If you’ve had enough of coding and you think you might like to do your coursework in ArcGIS and have a bit more practice with model builder, then you should go over to Moodle and carry out the exercise on the ‘Practical 6 - Part 2’ handout that is on there. This will show you how to analyse spatial autocorrelation in ArcGIS and will give you more practice will model builder.

Option 2. If you are a total bad-ass and want to continue with R, then briliant!! Woo hoo!! You can keep following the instuctions below.

Have the Arc lot gone? OK great, here’s some more lovely R…

In this section we are going to explore patterns of spatially referenced continuous observations using various measures of spatial autocorrelation. Check out the various references in the reading list for more information about these methods. There are also useful links in the helpfile of the spdep package which we will be using here.

Before we get any further, let’s get some ward boundaries read in to R - download LondonWardData from Moodle, unzip it and then read it in. #it’s probably projected correctly, but in case it isn’t give it a projection #using the CRS() function in the raster package

library(rgdal)
## rgdal: version: 1.4-4, (SVN revision 833)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
##  Path to GDAL shared files: C:/Users/ucfnmac/Documents/R/win-library/3.6/rgdal/gdal
##  GDAL binary built with GEOS: TRUE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: C:/Users/ucfnmac/Documents/R/win-library/3.6/rgdal/proj
##  Linking to sp version: 1.3-1
#read the ward data in
LondonWards <- readOGR("prac6_data/LondonWards.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\ucfnmac\OneDrive - University College London\Teaching\CASA0005repo\prac6_data\LondonWards.shp", layer: "LondonWards"
## with 625 features
## It has 77 fields

Tt’s probably projected correctly, but in case it isn’t give it a projection using the CRS() function in the raster package

proj4string(LondonWards) <- CRS("+init=epsg:27700")
## Warning in `proj4string<-`(`*tmp*`, value = new("CRS", projargs = "+init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894")): A new CRS was assigned to an object with an existing CRS:
## +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs +ellps=airy +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894
## without reprojecting.
## For reprojection, use function spTransform
#have a look to check that everything looks OK..

tmap_mode("view")
## tmap mode set to interactive viewing
tm_shape(LondonWards) +
  tm_polygons(col = NA, alpha = 0.5) +
tm_shape(BluePlaques) +
  tm_dots(col = "blue")

Ah yes, we might need to lose the blue plaques that fall outside of London

summary(BluePlaquesBNG)
## Object of class SpatialPointsDataFrame
## Coordinates:
##                  min      max
## coords.x1   505575.2 622211.7
## coords.x2 -5527598.3 199509.2
## Is projected: TRUE 
## proj4string :
## [+init=epsg:27700 +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717
## +x_0=400000 +y_0=-100000 +datum=OSGB36 +units=m +no_defs
## +ellps=airy
## +towgs84=446.448,-125.157,542.060,0.1502,0.2470,0.8421,-20.4894]
## Number of points: 2711
## Data attributes:
##        id         
##  Min.   :    1.0  
##  1st Qu.:  698.5  
##  Median : 6154.0  
##  Mean   :10724.9  
##  3rd Qu.:10397.0  
##  Max.   :49190.0  
##                   
##                                                                                                                                                                                                                                                        inscription  
##  Frank Matcham (1854-1920) theatre architect designed this theatre                                                                                                                                                                                           :   3  
##  Grimâ\200\231s Dyke                                                                                                                                                                                                                                               :   2  
##  Lillie Langtry 1852-1929 actress lived here                                                                                                                                                                                                                 :   2  
##  'Canons' gate pillars\r\nEntrance gate piers to 'Canons' from Old Turnpike Road\r\nBuilt in 1712 for James Brydges 1673-1744, 1st Duke of Chandos. House demolished 1740 and existing buildings constructed from salvaged materials. Piers refurbished 1998.:   1  
##  'Father' Henry Willis 1821-1901 organ builder lived here                                                                                                                                                                                                    :   1  
##   J. H. Greathead   Chief Engineer City and South London Railway. Inventor of the Travelling Shield that made possible the cutting of the tunnels of London's deep level tube system                                                                         :   1  
##  (Other)                                                                                                                                                                                                                                                     :2701
BluePlaquesSub <- BluePlaquesBNG[LondonWards,]
tm_shape(LondonWards) +
  tm_polygons(col = NA, alpha = 0.5) +
tm_shape(BluePlaquesSub) +
  tm_dots(col = "blue")

The measures of spatial autocorrelation that we will be using require continuous observations (counts of blue plaques, average GCSE scores, average incomes etc.) to be spatially referenced (i.e. attached to a spatial unit like a ward or a borough). The file you have already has the various obervations associated with the London Ward data file already attached to it, but let’s continue with our blue plaques example for now.

To create a continuous observation from the blue plaques data we need to count all of the blue plaques that fall within each Ward in the City. Luckily, we can do this using the poly.counts function in Chris Brunsdon’s excellent gistools package… (we could also use the over() function in the sp package if we wanted)

res <- poly.counts(BluePlaquesSub, LondonWards)
#and add this as a column in our spatialPolygonsDataframe
LondonWards@data$BluePlaqueCount<-res
#as the wards are of different sizes, perhaps best that we calculate a density
LondonWards@data$BlueDensity <- LondonWards$BluePlaqueCount/poly.areas(LondonWards)
#let's just check the data to see if the calculations have worked
LondonWards@data

How about a quick choropleth map to see how we are getting on…

tm_shape(LondonWards) +
    tm_polygons("BlueDensity",
        style="jenks",
        palette="PuOr",
        midpoint=NA,
        title="Blue Plaque Density")

So, from the map, it looks as though we might have some clustering of blue plaques in the centre of London so let’s check this with Moran’s I and some other statistics.

Before being able to calculate Moran’s I and any similar statistics, we need to first define a \(W_{ij}\) spatial weights matrix

library(spdep)

First calculate the centroids of all Wards in London

#First calculate the centroids of all Wards in London
coordsW <- coordinates(LondonWards)
plot(coordsW)

Now we need to generate a spatial weights matrix (remember from the lecture). We’ll start with a simple binary matrix of queen’s case neighbours

#create a neighbours list
LWard_nb <- poly2nb(LondonWards, queen=T)
#plot them
plot(LWard_nb, coordinates(coordsW), col="red")
#add a map underneath
plot(LondonWards, add=T)

#create a spatial weights object from these weights
Lward.lw <- nb2listw(LWard_nb, style="C")
head(Lward.lw$neighbours)
## [[1]]
## [1]   6   7  10 462 468 482
## 
## [[2]]
## [1]  5  8  9 11 12 13 16
## 
## [[3]]
## [1]  10  11  12  15 480 483
## 
## [[4]]
## [1]  17 281 291 470 473 481
## 
## [[5]]
## [1]   2   9  16 281 283 290
## 
## [[6]]
## [1]  1  7  8 10 11 14

Now we have defined our \(W_{ij}\) matrix, we can calculate the Moran’s I and other associated statistics

Moran’s I test tells us whether we have clustered values (close to 1) or dispersed values (close to -1), we will calculate for the densities rather than raw values

I_LWard_Global_Density <- moran.test(LondonWards@data$BlueDensity, Lward.lw)
I_LWard_Global_Density
## 
##  Moran I test under randomisation
## 
## data:  LondonWards@data$BlueDensity  
## weights: Lward.lw    
## 
## Moran I statistic standard deviate = 30.211, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##      0.6726785503     -0.0016025641      0.0004981274

Geary’s C as well..? This tells us whether similar values or dissimilar values are clusering

C_LWard_Global_Density <- geary.test(LondonWards@data$BlueDensity, Lward.lw)
C_LWard_Global_Density
## 
##  Geary C test under randomisation
## 
## data:  LondonWards@data$BlueDensity 
## weights: Lward.lw 
## 
## Geary C statistic standard deviate = 8.3637, p-value < 2.2e-16
## alternative hypothesis: Expectation greater than statistic
## sample estimates:
## Geary C statistic       Expectation          Variance 
##       0.403108506       1.000000000       0.005093209

Getis Ord General G…? This tells us whether high or low values are clustering. If G > Expected = High values clustering; if G < expected = low values clustering

G_LWard_Global_Density <- globalG.test(LondonWards@data$BlueDensity, Lward.lw)
G_LWard_Global_Density
## 
##  Getis-Ord global G statistic
## 
## data:  LondonWards@data$BlueDensity 
## weights: Lward.lw 
## 
## standard deviate = 29.488, p-value < 2.2e-16
## alternative hypothesis: greater
## sample estimates:
## Global G statistic        Expectation           Variance 
##       1.123988e-02       1.602564e-03       1.068124e-07

So the global statistics are indicating that we have spatial autocorrelation of Blue Plaques in London:

The Moran’s I statistic = 0.51 (remember 1 = clustered, 0 = no pattern, -1 = dispersed) which shows that we have some distinctive clustering

The Geary’s C statistic = 0.41 (remember Geary’s C falls between 0 and 2; 1 means no spatial autocorrelation, <1 - positive spatial autocorrelation or similar values clustering, >1 - negative spatial autocorreation or dissimilar values clustering) which shows that similar values are clustering

The General G statistic = G > expected, so high values are tending to cluster.

We can now also calculate local versions of the Moran’s I statistic (for each Ward) and a Getis Ord \(G_{i}^{*}\) statistic to see where we have hot-spots…

#use the localmoran function to generate I for each ward in the city
I_LWard_Local <- localmoran(LondonWards@data$BluePlaqueCount, Lward.lw)
I_LWard_Local_Density <- localmoran(LondonWards@data$BlueDensity, Lward.lw)
#what does the output (the localMoran object) look like?
head(I_LWard_Local_Density)
##           Ii         E.Ii    Var.Ii      Z.Ii Pr(z > 0)
## 0 0.08893069 -0.001632161 0.1585720 0.2274243 0.4100469
## 1 0.13238172 -0.001904187 0.1847261 0.3124398 0.3773532
## 2 0.10800337 -0.001632161 0.1585720 0.2753202 0.3915351
## 3 0.11771065 -0.001632161 0.1585720 0.2996974 0.3822040
## 4 0.11771065 -0.001632161 0.1585720 0.2996974 0.3822040
## 5 0.11213389 -0.001632161 0.1585720 0.2856929 0.3875567

There are 5 columns of data. We want to copy some of the columns (the I score (column 1) and the z-score standard deviation (column 4)) back into the LondonWards spatialPolygonsDataframe

LondonWards@data$BLocI <- I_LWard_Local[,1]
LondonWards@data$BLocIz <- I_LWard_Local[,4]
LondonWards@data$BLocIR <- I_LWard_Local_Density[,1]
LondonWards@data$BLocIRz <- I_LWard_Local_Density[,4]

No we can plot a map of the local Moran’s I outputs…

We’ll set the breaks manually based on the rule that data points >2.58 or <-2.58 standard deviations away from the mean are significant at the 99% level (<1% chance that autocorrelation not present); >1.96 - <2.58 or <-1.96 to >-2.58 standard deviations are significant at the 95% level (<5% change that autocorrelation not present). >1.65 = 90% etc.

#create a new diverging colour brewer palette and reverse the order using rev so higher values correspond to red

breaks1<-c(-1000,-2.58,-1.96,-1.65,1.65,1.96,2.58,1000)

Now create a new diverging colour brewer palette and reverse the order using rev so higher values correspond to red

MoranColours<- rev(brewer.pal(8, "RdGy"))

Plot on an interactive map

tm_shape(LondonWards) +
    tm_polygons("BLocIRz",
        style="fixed",
        breaks=breaks1,
        palette=MoranColours,
        midpoint=NA,
        title="Local Moran's I, Blue Plaques in London")

This map shows some areas in the centre of London that have relatively high scores, indicating areas with lots of blue plaques neighbouring other areas with lots of blue plaques.

What about the Getis Ord \(G_{i}^{*}\) statisic for hot and cold spots?

Gi_LWard_Local_Density <- localG(LondonWards@data$BlueDensity, Lward.lw)

Check the help file (?localG) to see what a localG object looks like - it is a bit different from a localMoran object as it only contains just a single value - the z-score (standardised value relating to whether high values or low values are clustering together) And map the outputs…

head(Gi_LWard_Local_Density)
## [1] -0.8364891 -0.8720954 -0.7679474 -0.8368497 -0.8368497 -0.7972659
LondonWards@data$BLocGiRz <- Gi_LWard_Local_Density

And map the outputs…

GIColours<- rev(brewer.pal(8, "RdBu"))

#now plot on an interactive map
tm_shape(LondonWards) +
    tm_polygons("BLocGiRz",
        style="fixed",
        breaks=breaks1,
        palette=GIColours,
        midpoint=NA,
        title="Gi*, Blue Plaques in London")

The local Moran’s I and \(G_{i}^{*}\) statistics for wards clearly show that the density of blue plaques in the centre of the city exhibits strong (and postitive) spatial autocorrelation, but neither of these maps are very interesting. The ArcGIS crew will have been calculating Local Moran’s I and \(G_{i}^{*}\) statistics for some of the other variables for the wards of London. Why not try some alternative variables and see what patterns emerge… here I’m going to have a look at Average GSCE scores…

#use head to see what other variables are in the data file
head(LondonWards@data)
##      WD11CD WD11CDO         WD11NM WD11NMW
## 0 E05000026  00ABFX          Abbey    <NA>
## 1 E05000027  00ABFY         Alibon    <NA>
## 2 E05000028  00ABFZ      Becontree    <NA>
## 3 E05000029  00ABGA Chadwell Heath    <NA>
## 4 E05000030  00ABGB      Eastbrook    <NA>
## 5 E05000031  00ABGC       Eastbury    <NA>
##                                WardName OldCode  Wardcode Pop2013 Aged0_15
## 0          Barking and Dagenham - Abbey  00ABFX E05000026   13650     3450
## 1         Barking and Dagenham - Alibon  00ABFY E05000027   10400     2700
## 2      Barking and Dagenham - Becontree  00ABFZ E05000028   12050     3000
## 3 Barking and Dagenham - Chadwell Heath  00ABGA E05000029   10150     2450
## 4      Barking and Dagenham - Eastbrook  00ABGB E05000030   10600     2150
## 5       Barking and Dagenham - Eastbury  00ABGC E05000031   11700     3000
##   Aged16_64 Aged65plus PctAged0_1 PctAged16_ PctAged65p MeanAge201
## 0      9550        700       25.3       70.0        5.1       29.5
## 1      6600       1100       26.0       63.5       10.6       33.8
## 2      8000       1100       24.9       66.4        9.1       33.0
## 3      6150       1550       24.1       60.6       15.3       36.2
## 4      6900       1550       20.3       65.1       14.6       37.7
## 5      7550       1150       25.6       64.5        9.8       33.4
##   MedianAge2 AreaSqKM PopDensity PctBame PctNotBorn PctNoEngli GenFertRat
## 0         29      1.3    10500.0    71.9       57.3       25.7        103
## 1         33      1.4     7428.6    29.9       24.7        7.9         91
## 2         32      1.3     9269.2    41.2       30.1       10.5        102
## 3         34      3.4     2985.3    37.9       24.8        6.5         81
## 4         36      3.5     3028.6    24.8       19.0        4.5         78
## 5         32      1.4     8357.1    41.7       32.2       11.9         89
##   MaleLE0509 FemaleLE05   PctYr1Obes   PctYr6Obes RateAmbula RatesAmbul
## 0       80.0       82.2 15.423728814 21.467391304   164.6886  0.9352518
## 1       75.8       80.4         12.5 25.853658537   134.5192  0.7837838
## 2       78.9       78.9 12.857142857 24.874371859   127.8838  0.7010309
## 3       79.1       81.0 11.751662971 25.816993464   149.5567  0.6000000
## 4       77.1       80.6 13.942307692 24.324324324   145.0000  0.6666667
## 5       78.6       84.6 13.621262458 22.608695652   124.2735  0.4666667
##   RoadKilled InEmployme Employment NoJobs2011 EmpWkAgePo RateNINoFo
## 0          3       5489   58.35016       8900  0.9319372  108.69565
## 1          4       4214   59.26864        900  0.1363636   31.14754
## 2          1       4674   57.89669       1200  0.1500000   38.55422
## 3          2       3916   58.67546       1800  0.2926829   26.27119
## 4          5       4686   62.84871       4000  0.5797101   18.43972
## 5          2       4620   58.34070       1100  0.1456954   48.92086
##   MeanHouseP NoProperti NoHousehol PctDetache PctSemiDet PctTerrace
## 0     164000         60       4753        3.9        7.2       22.6
## 1     173500         73       4045        3.8       18.2       63.8
## 2     178995        147       4378        4.2       24.7       52.0
## 3     210000         87       4062        3.8       29.6       32.2
## 4     203500         84       3977        3.2       32.6       45.5
## 5     187875         75       4321        5.0       21.0       52.5
##   PctFlatMai PctOwned20 PctSocialR PctPrivate PctCTaxBan PctCTaxB_1
## 0       66.3       32.7       26.7       38.6   45.77158   56.47341
## 1       14.2       45.1       36.8       15.9    9.29495   90.75384
## 2       19.1       46.7       29.4       20.1   11.94346   90.76561
## 3       34.5       54.0       32.0       12.4   33.28380   66.36949
## 4       18.4       67.6       20.0       11.2   14.81574   84.93526
## 5       21.5       44.0       37.4       17.3   18.72262   81.41573
##   PctCTaxB_2 Incapacity IncomeSupp EmpSupport JSAClaiman PctDepChil
## 0 0.10897995   1.570681   3.612565   3.403141   8.737771   28.15789
## 1 0.07318858   3.257576   6.590909   6.060606   9.473789   35.55556
## 2 0.09422850   2.312500   4.750000   5.500000  10.234388   32.95775
## 3 0.44576523   2.682927   5.121951   4.634146   8.342021   28.79310
## 4 0.54780876   1.956522   3.188406   3.623188   8.293168   24.71698
## 5 0.04611483   2.450331   5.033113   5.099338   9.463746   30.14286
##   PctHHNoAdu PctLonePar IDRankLond IDPctWorst AvgGCSE201 UnauthAbse
## 0   8.748906   55.31062        166   85.71429        330        1.3
## 1  12.440191   60.99518        124  100.00000        341        1.3
## 2  10.731821   51.51057        185  100.00000        346        1.4
## 3  10.147133   55.23979         96  100.00000        327        1.7
## 4   6.663236   48.69792        288  100.00000        349        1.0
## 5  11.118914   52.45902        135  100.00000        339        1.4
##   PctWithNoQ PctLev4Qua CrimeRate1 ViolenceRa RobberyRat TheftAndHa
## 0       16.4       34.5      164.4   35.71429   6.541353   71.35338
## 1       31.2       16.7       83.9   22.88462   3.365385   23.46154
## 2       28.0       20.6       85.7   17.73109   2.605042   30.25210
## 3       29.1       19.5       84.5   18.62069   2.758621   28.57143
## 4       29.9       18.5       64.9   17.74648   2.159624   18.87324
## 5       28.9       20.0       75.9   17.87234   3.489362   22.29787
##   CriminalDa DrugsRate1 Deliberate PctOpenSpa CarsPerHH2 AvgPubTran
## 0  11.954887  14.060150        0.6   19.60720  0.5476815        5.7
## 1  10.000000   6.442308        0.3   22.41290  0.8151599        3.2
## 2   5.882353   6.302521        0.7    3.03888  0.8702361        2.9
## 3  10.738916   3.349754        1.1   56.40730  0.9180619        2.2
## 4   5.915493   4.037559        1.0   51.11650  1.0604579        2.4
## 5   7.744681   4.765957        1.3   18.10540  0.7827715        2.8
##   PctTTWBike TurnoutMay Shape_Leng Shape_Area ID      x      y
## 0  0.8016032   25.68894   6244.870    1282926  1 544204 184358
## 1  1.0204082   20.34793   6353.916    1364441  2 549062 185153
## 2  1.6474112   22.53821   6341.656    1288085  3 547000 186088
## 3  1.1746680   25.31881   9603.344    3384198  4 548360 189491
## 4  1.5578318   24.12147   8987.676    3450578  5 550790 186101
## 5  1.5151515   21.51488   6829.478    1440028  6 546139 183989
##   BluePlaqueCount  BlueDensity      BLocI    BLocIz     BLocIR   BLocIRz
## 0               1 7.794681e-07 0.05679283 0.1567154 0.08893069 0.2274243
## 1               0 0.000000e+00 0.08333988 0.2118224 0.13238172 0.3124398
## 2               0 0.000000e+00 0.06818055 0.1872611 0.10800337 0.2753202
## 3               0 0.000000e+00 0.07387441 0.2025339 0.11771065 0.2996974
## 4               0 0.000000e+00 0.07387441 0.2025339 0.11771065 0.2996974
## 5               0 0.000000e+00 0.06818055 0.1872611 0.11213389 0.2856929
##     BLocGiRz
## 0 -0.8364891
## 1 -0.8720954
## 2 -0.7679474
## 3 -0.8368497
## 4 -0.8368497
## 5 -0.7972659
I_LWard_Local_GCSE <- localmoran(LondonWards@data$AvgGCSE201, Lward.lw)
LondonWards@data$GCSE_LocIz <- I_LWard_Local_GCSE[,4]

tm_shape(LondonWards) +
    tm_polygons("GCSE_LocIz",
        style="fixed",
        breaks=breaks1,
        palette=MoranColours,
        midpoint=NA,
        title="Local Moran's I, GCSE Scores")

Now the Gi* statistic to look at clusters of high and low scores…

Gi_LWard_Local_GCSE <- localG(LondonWards@data$AvgGCSE201, Lward.lw)
LondonWards@data$GCSE_LocGiz <- Gi_LWard_Local_GCSE

tm_shape(LondonWards) +
    tm_polygons("GCSE_LocGiz",
        style="fixed",
        breaks=breaks1,
        palette=GIColours,
        midpoint=NA,
        title="Gi*, GCSE Scores")

So this is the end of the practical. Hopefully you have learned a lot about the different methods we can employ to analyse patterns in spatial data.

This practical was deliberately designed as a walk through, but this may have given you ideas about where you could perhaps take these techniques in your coursework if this is something you wanted to explore further with different data or in different contexts.

Things to perhaps try as an extension…

  1. We have used sp objects in this practical (because I wrote it before sf became the defacto spatial data type in R). Can you convert some of this so it works withsf?
  2. Could you atomate any of the functions so that you could quickly produce maps of any of the variables in the LondonWards dataset?
  3. Could you get these outputs into a faceted ggplot2 map?